feat: add configurable token cache for customer flow#133
Conversation
prashantrakheja
left a comment
There was a problem hiding this comment.
changes look good, some minor feedback, in particular i would like to remove usage of app_tid from token caching because its an optional parameter and likely to be removed going forward..
| Returns: | ||
| System-scoped access token. | ||
| """ | ||
| cached = cache.get_system_token(app_tid) |
There was a problem hiding this comment.
i am not sure if app_tid would be needed for fetching the tokens, is there any other way to do this?
| logger.debug("Loaded %d tool(s) from %s", len(server_tools), dep.ord_id) | ||
| except Exception: | ||
| except Exception as exc: | ||
| unwrapped = _unwrap_exception_group(exc) |
There was a problem hiding this comment.
do u feel the exception block is unusually big? can we put this stuff inside a method?
There was a problem hiding this comment.
Updated logic here - request system_token when needed. Added helper closure to refetch token
| ) | ||
| continue | ||
| except Exception: | ||
| logger.exception( |
There was a problem hiding this comment.
i feel we should not swallow the exception here, if there is a failure we should fail-fast, what's ur take?
There was a problem hiding this comment.
I'll extract logic to a method, good catch.
Regarding exception - it basically follows what was implemented before - log with logger.exception and continue
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
this operation is "internal" to the functioning of the sdk, do u feel it makes sense to capture metrics for this?
There was a problem hiding this comment.
I didn't see value exposing this method
There was a problem hiding this comment.
Removed. My assumption was that consumers of SDK should be able to set/unset/cleanup cache should they require it. I had an impression that it's usual for SDKs :) removed as considered unwanted
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
I didn't see value exposing this method
| credentials: CustomerCredentials, | ||
| grant_type: str, | ||
| timeout: float, | ||
| config: ClientConfig, |
There was a problem hiding this comment.
Isn't timeout part of client config?
There was a problem hiding this comment.
Yes, though it's not needed here. Passing cache instead now since it's available and has required information
| @@ -25,6 +25,8 @@ | |||
| IntegrationDependency, | |||
There was a problem hiding this comment.
Shouldn't we also add token cache for LoB flow to keep consistency?
There was a problem hiding this comment.
I had an impression that since LoB flow goes through destination service it would handle the token request and caching. However seems like it's not the case. I'll add caching to lob flow
Description
Add in-process token cache for customer agent flow. Previously every
list_mcp_tools()/call_mcp_tool()call fetched fresh IAS token via mTLS - unnecessary latency in agentic loops.Changes:
_token_cache.py(new):_TokenCache- TTL + LRU eviction for system tokens (key:client_id) and user tokens (key:sha256(user_jwt+"|"+client_id)[:16]). Expiry is fromexpires_in,id_tokenexp claim, or fallback TTL._customer.py:get_system_token_mtls/exchange_user_tokento consult/populate cache. 401 response from MCP server → invalidate stale token + retry once.agw_client.py:AgentGatewayClientowns_TokenCacheconfig.py: 4 newClientConfigfields added -token_expiry_buffer_seconds(60 s),max_system_token_cache_size(10 s),max_user_token_cache_size(10 s),fallback_token_ttl_seconds(300 s).LoB flow unaffected - delegates to BTP Destination Service.
Type of Change
Please check the relevant option:
How to Test
Describe how reviewers can test your changes:
Checklist
Before submitting your PR, please review and check the following:
Breaking Changes
None. Cache internal to
AgentGatewayClient- existingcreate_client()calls get caching automatically.ClientConfignew fields all have defaults.Additional Notes
Thread safety: GIL makes individual
OrderedDictops atomic, but check-then-set is not. Concurrent coroutines on same key may both miss and both fetch - redundant requests, not corruption. Acceptable for agentic loop use case.401 retry: Both
get_mcp_tools_customerandcall_mcp_tool_customerinvalidate + retry once on 401, handling server-side revocation before token expiry.